Integrating Approximate Summarization with Provenance Capture

Authors

Seokki Lee

Xing Niu

Bertram Ludäscher

Boris Glavic

Abstract

How to use provenance to explain why a query returns a result or why a result is missing has been studied extensively. Recently, we have demonstrated how to uniformly answer these types of provenance questions for first-order queries with negation and have presented an implementation of this approach in our PUG (Provenance Unification through Graphs) system. However, for realistically sized databases, the provenance of answers and missing answers can be very large, overwhelming the user with too much information and wasting computational resources. In this paper, we introduce an (approximate) summarization technique that generates compact representations of why and why-not provenance. Our technique uses patterns as a summarized representation of sets of elements from the provenance, i.e., successful or failed derivations. We rank these patterns based on their descriptiveness (we use precision and recall as quality measures for patterns) and return only the top-k summaries. We demonstrate how this summarization technique can be integrated with provenance capture to compute summaries on demand and how sampling techniques can be employed to speed up both the summarization and capture steps. Our preliminary experiments demonstrate that this summarization technique scales to large instances of a real-world dataset.
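The abstract describes ranking patterns over provenance elements by precision and recall and returning the top-k. The Python sketch below illustrates that idea under assumptions that are ours, not the paper's: each provenance element (derivation) is a flat tuple of attribute values, a pattern replaces some values with a `*` wildcard, precision and recall are measured against a universe of candidate derivations that contains the relevant ones, and the two measures are combined with the F1 score. All names (`matches`, `candidate_patterns`, `top_k_patterns`) and the flight-like rows are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding (not PUG's actual format): a provenance element is a
# tuple of attribute values; a pattern is a tuple of the same arity in which
# WILDCARD matches any value.
WILDCARD = "*"
Row = Tuple[str, ...]


def matches(pattern: Row, row: Row) -> bool:
    """True if every non-wildcard position of the pattern equals the row's value."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, row))


def candidate_patterns(rows: Iterable[Row]) -> Set[Row]:
    """Generalize each row by replacing every subset of its positions with the wildcard."""
    candidates: Set[Row] = set()
    for row in rows:
        for mask in product((False, True), repeat=len(row)):
            candidates.add(tuple(WILDCARD if m else v for m, v in zip(mask, row)))
    return candidates


def precision_recall(pattern: Row, relevant: Set[Row], universe: List[Row]) -> Tuple[float, float]:
    """Precision: share of matched elements that are relevant.
    Recall: share of relevant elements matched (assumes relevant is a subset of universe)."""
    matched = [r for r in universe if matches(pattern, r)]
    if not matched or not relevant:
        return 0.0, 0.0
    hits = sum(1 for r in matched if r in relevant)
    return hits / len(matched), hits / len(relevant)


def top_k_patterns(relevant: Set[Row], universe: List[Row], k: int) -> List[Tuple[Row, float, float]]:
    """Return the k best patterns as (pattern, precision, recall), ranked by F1.
    Using F1 to combine precision and recall is an assumption of this sketch."""
    scored = []
    for pattern in candidate_patterns(relevant):
        prec, rec = precision_recall(pattern, relevant, universe)
        f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
        scored.append((f1, pattern, prec, rec))
    # Highest F1 first; ties are broken by the pattern tuple for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(pattern, prec, rec) for _, pattern, prec, rec in scored[:k]]


# Hypothetical data: all derivations of a query, and the subset whose
# provenance the user asked about.
universe: List[Row] = [
    ("AA", "CHI", "NYC"),
    ("AA", "CHI", "LAX"),
    ("AA", "SEA", "NYC"),
    ("UA", "CHI", "NYC"),
    ("UA", "SEA", "LAX"),
]
relevant: Set[Row] = {("AA", "CHI", "NYC"), ("AA", "CHI", "LAX"), ("AA", "SEA", "NYC")}

if __name__ == "__main__":
    for pattern, prec, rec in top_k_patterns(relevant, universe, k=3):
        print(pattern, round(prec, 2), round(rec, 2))
```

On this toy data, the pattern `("AA", "*", "*")` covers exactly the three relevant derivations and nothing else, so it ranks first with precision and recall both 1.0. Enumerating all wildcard subsets is exponential in the tuple arity; the sampling the abstract mentions would, in a sketch like this, amount to scoring patterns on a random subset of `universe` to trade accuracy for speed (not shown here).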


Similar resources

WING-NUS at CL-SciSumm 2017: Learning from Syntactic and Semantic Similarity for Citation Contextualization

We present here a system report for our model submitted to the shared task on Computational Linguistic Scientific-document Summarization (CL-SciSumm) 2017. We hypothesize that search- and retrieval-based techniques are sub-optimal for learning complex relations like provenance. State-of-the-art information retrieval techniques using term frequency inverted document frequency (TF-IDF) to capture surfa...


Toward Provenance Capturing as Cross-Cutting Concern

Although provenance has gained much attention, solutions to capture provenance do not meet all the requirements. For instance, most solutions currently assume a closed world and are explicitly designed to capture provenance. Thus, they fail to integrate the provenance concern into existing environments. Hence, we argue that provenance should be considered as cross-cutting concern that can easily b...


SGProv: Summarization Mechanism for Multiple Provenance Graphs

Scientific workflow management systems (SWfMS) are powerful tools in the automation of scientific experiments. Several workflow executions are necessary to accomplish one scientific experiment. Data provenance, typically collected by SWfMS during workflow execution, is important to understand, reproduce and analyze scientific experiments. Provenance is about data derivation, thus it is typicall...


Syntactic Query Models for Restatement Retrieval

We consider the problem of retrieving sentence level restatements. Formally, we define restatements as sentences that contain all or some subset of information present in a query sentence. Identifying restatements is useful for several applications such as multi-document summarization, document provenance, text reuse and novelty detection. Spurious partial matches and term dependence become imp...


A Role for Provenance in Social Computation

We argue that existing systems to support social computation suffer from a lack of transparency and that this can be addressed by integrating provenance capture mechanisms into such systems. We discuss how Semantic Web technologies can be used to facilitate this, and how the provenance record could be used to support various forms of decision-making about tasks such as workforce selection.




Publication date: 2017
